Electrical load prediction is important because it informs the choice of generation sources, 
cost-effective generation, system security, and policies on continuity of service to consumers. 
This paper uses primary data from Indonesia, compiled from the hourly log sheets of transmission 
operators. In data preprocessing, a detrending technique is used to remove outliers from the 
time series. The prediction model is a long short-term memory (LSTM) algorithm with stacking and 
time-step techniques. To obtain the best one-day-ahead forecast, the inputs are arranged from the 
previous three periods, and models with 1, 2, and 3 layers and 512 and 1024 nodes are compared. 
The LSTM with three layers and 1024 nodes achieved a mean absolute percentage error (MAPE) of 
8.63, better than the other models. 

1. INTRODUCTION 

Electrical load forecasting is important in the electric power industry because it relates to the 
management of generating sources, cost-effective power system planning, system security, customer 
service, and policy making [1], [2]. Electrical load forecasting research is based on past load 
patterns [3], which may reappear in the future. Load forecasting can be classified by the length of 
the horizon and by the method used [4]. By horizon length [5], [6], forecasts are long-term (1 year 
or more), medium-term (1 week to 1 year), or short-term (1 hour to 1 week). By method, statistical 
methods use a mathematical approach based on correlation and regression, while machine learning and 
deep learning methods are built through a training and testing process that fits the model and then 
evaluates its performance. The advantage of deep learning models is that they train complex neural 
networks more efficiently, with generalization capabilities that increase accuracy by around 
20-45% [7]. Among the three approaches, deep learning techniques are the most successful in the 
fields of image, text, and data mining [8]. 

Several studies have applied deep learning methods to load forecasting. In the convolutional 
neural network (CNN) method, the CNN layer is used to extract features from historical loads in 
homogeneous residential load clustering [9], [10]. The recurrent neural network (RNN) architecture 
supports non-stationary discrete-time input signals, making RNNs more suitable for sequential 
data [11]-[13]. Long short-term memory (LSTM) combines short-term memory with long-term memory and 
uses gate control to prevent signal loss during the prediction process, which yields better 
accuracy [14]-[16]. The gated recurrent unit (GRU), whose use in demand-side energy forecasting is 
still limited [17], has been applied to predict electrical power load [18]; its prediction 
performance is still lower than that of LSTM but better than that of traditional models. 

Modeling time-series datasets with LSTM is a popular way to solve complex sequence problems such as 
electrical loads, because LSTM learns long-term dependencies and traces patterns that occurred far 
in the past. Several studies have developed LSTM algorithms for short-term electrical load 
forecasting and found them to be the best algorithms for predicting time series data [5], [16], 
[19]-[22]. According to [16], [21], the vital problem in LSTM-based forecasting models is choosing 
the right sequence of lag times along with the right model hyperparameters for a given time series. 
One-day-ahead electric load prediction in [19] uses the k-means technique to group consumer 
features by time sequence, namely adjacent times, adjacent days, and the same day of adjacent 
weeks. The experimental results in [19], [23] show that tuning the number of nodes in the parallel 
LSTM layers gives good accuracy, and that LSTM overall can find the energy consumption patterns of 
consumers. Kwon et al. [5] applied a short-term forecasting method that extracts separate features 
from historical data, namely the day of the week, the average load of the last two days, the hourly 
load, and the hourly temperature, and obtained good accuracy for one-day-ahead load forecasting in 
Korea. Recent research [20] uses a half-hourly dataset from 2008-2016 for metropolitan France for 
one-day forecasting. Bouktif et al. [20] discuss the optimal configuration of the LSTM model by 
tuning the number of layers, the number of neurons in each layer, the lag time, the batch size, and 
the type of activation, so as to reduce the complexity of the dataset used. Abdel-Nasser and 
Mahmoud [22] compared the basic LSTM architecture, LSTM with the sliding-window technique, LSTM 
with time steps, and stacked LSTMs against other benchmark models in modeling temporal changes in 
hourly photovoltaic (PV) output power in Egypt. All LSTM architectures outperformed the benchmark 
models for one-hour-ahead predictions, and the LSTM architecture with time steps obtained the 
lowest error of all the LSTM architectures. 

This paper proposes predicting the short-term load one day ahead using LSTM with stacking and 
time-step techniques, for use in planning operations at the load control center one day ahead. The 
study uses a primary load dataset of Indonesia’s electrical energy consumption, a kind of data that 
is still limited to date. The dataset is compiled from PT PLN’s daily log sheets, based on the 
hourly records of the transmission system operator over five years, 2013-2017. In preliminary 
research [24], our validation showed that LSTM-based forecasting models outperform other 
alternative approaches. Building on that research, we developed an LSTM algorithm that sets the 
optimal lag time and time step and integrates them with a three-layer stacked LSTM. Predictive 
performance is measured using the mean absolute error (MAE) and the mean absolute percentage error 
(MAPE). The rest of this article is structured as follows. Section 2 presents the research 
methodology and gives an overview of the prediction technique that we propose. Section 3 describes 
the experimental results and validation, and compares the model with other LSTM models and deep 
learning models. Section 4 draws conclusions from the research. 


2. RESEARCH METHOD 

This section explains a short-term, one-day-ahead forecasting methodology using the LSTM 
algorithm. We propose a three-step framework: dataset preparation and preprocessing; construction 
of the LSTM algorithm and the comparison machine learning (ML) algorithms; and LSTM training and 
validation. The following subsections give an overview of the methodology and explain each 
component in detail for the Indonesian electricity consumption dataset used in this case study. 


2.1. Data preparation and preprocessing 

Forecasting the electrical load uses sequential data, in which each value is related to past 
values. Past data on electrical energy use represent trends and load patterns as well as any 
anomalies that occur. An exploratory analysis of the electrical load time series is useful for 
identifying these trends, patterns, and anomalies. Consumption is grouped into daily, monthly, and 
yearly electrical load so that the relationship between time and electricity usage can be seen 
graphically. 


2.1.1. Dataset 

The pattern of electricity consumption in this dataset is shown in Figures 1-3, which depict the 
electrical load per day, per month, and per year. Because the data are recorded sequentially every 
hour, each year consists of 365 days, or 8,760 data points. The daily load profile over 24 hours is 
shown in Figure 1. Load consumption trends upward from 08.00, reaches a daytime peak at 
11.00-15.00, and decreases again as population activity declines. The peak load occurs at 
18.00-20.00, when the housing and industrial load from lighting and electronic equipment increases 
(65% of the load is housing load). Load consumption then falls until the following morning. 
Figure 2 shows the hourly electricity consumption for one month, which contains many load outliers 
at certain hours. These anomalies are caused by weather or human error and are handled by 
preprocessing the load before it is used as input for the machine learning model. In addition, 
Figure 3 illustrates the electrical consumption for one year. At the beginning of the year the 
electrical load fluctuates slightly, whereas from the middle to the end of the year the consumption 
is relatively steady. 

Figure 3. Yearly electrical load 


Forecasting in this study is short-term. The data used are the hourly electrical loads on the same 
day of the week. For example, to predict the electrical load on a Monday, the model references data 
from the previous Monday. The input is composed from the historical values of the previous three 
periods. Using too much historical data can cause multicollinearity problems, which make the 
model’s predictions less sensitive. Figure 4 shows how the data are reshaped as input for the LSTM 
model. 


Figure 4. Use of three historical data periods to forecast the next period 
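
The reshaping described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `make_lagged_windows` and the `lag_step` parameter (the spacing between historical periods, in hours) are assumptions introduced here, since the text does not state how the three periods are spaced.

```python
import numpy as np

def make_lagged_windows(load, n_lags=3, lag_step=1):
    """Build LSTM inputs of shape (samples, n_lags, 1) and next-value targets.

    load: 1-D hourly load series.
    n_lags: number of historical periods per sample (three in the text).
    lag_step: spacing between periods in hours (assumed; 24 * 7 would pair
        each hour with the same hour on the same weekday of earlier weeks).
    """
    load = np.asarray(load, dtype=float)
    span = n_lags * lag_step
    if load.size <= span:
        raise ValueError("series is too short for the requested lags")
    inputs, targets = [], []
    for t in range(span, load.size):
        # Oldest period first: t - n_lags*lag_step, ..., t - lag_step.
        inputs.append(load[t - span:t:lag_step])
        targets.append(load[t])
    # Add a trailing feature axis because the series is univariate.
    return np.array(inputs)[..., np.newaxis], np.array(targets)
```

For example, `make_lagged_windows(np.arange(10), n_lags=3)` pairs the values at hours 0, 1, and 2 with the target at hour 3, and so on along the series.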


2.1.2. Data pre-processing 
The dataset used in this study comes from Indonesia and covers 2013-2017, taken from operator 
records of the high-voltage overhead line transmission system. Because this research uses a 
univariate dataset, we use only active power (MW), i.e. power usage or electricity consumption, and 
discard the other data in the transmission operator’s records. Data pre-processing consists of 
imputing missing values and detrending, which removes distortion in the form of an increase or 
decrease from the normalized time series data. Detrending is a process that aims to remove trends 
from the time series. The trend usually refers to the average change over time. The detrending 
process removes aspects that may distort the data. Distortion in time series data can be seen as 
fluctuations in the time-series graph. Once the distortion is eliminated, the increases and 
decreases of the time series itself can be seen. Figure 5(a) illustrates the raw data, which 
contain extreme fluctuations that may distort the training process. A detrending process was 
therefore performed, with the result illustrated in Figure 5(b). In this study, normalization 
rescales the features to the range [-1, 1] because this range is suitable for data that have 
outliers. Normalization is carried out in the pre-processing stage and is followed by data 
management, which includes data transformation and splitting the data into training and testing 
sets. Training and testing data are separated using an 80/20 ratio, respectively [25]. 

Figure 5. (a) Observed data and (b) detrending result 
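
The pre-processing steps can be illustrated with the sketch below. Several choices are assumptions made only for illustration, because the text does not specify them: missing values are filled by linear interpolation, the trend is modelled as a straight line, the steps are applied in the order shown, and the scaling range is taken from the training portion only. All function names are hypothetical.

```python
import numpy as np

def impute_missing(load):
    """Fill NaN readings by linear interpolation (assumed imputation rule)."""
    load = np.asarray(load, dtype=float).copy()
    idx = np.arange(load.size)
    missing = np.isnan(load)
    load[missing] = np.interp(idx[missing], idx[~missing], load[~missing])
    return load

def detrend_linear(load):
    """Subtract a least-squares straight line (assumed trend model)."""
    t = np.arange(load.size)
    slope, intercept = np.polyfit(t, load, deg=1)
    return load - (slope * t + intercept)

def chronological_split(series, train_fraction=0.8):
    """Split without shuffling so the test data follow the training data."""
    cut = int(series.size * train_fraction)
    return series[:cut], series[cut:]

def scale_to_unit_range(train, test):
    """Min-max scale to [-1, 1] using the training minimum and maximum."""
    lo, hi = train.min(), train.max()

    def scale(x):
        return 2.0 * (x - lo) / (hi - lo) - 1.0

    return scale(train), scale(test)
```

A typical call sequence under these assumptions would be `impute_missing`, then `detrend_linear`, then `chronological_split`, then `scale_to_unit_range`. Test values may fall slightly outside [-1, 1] because the range is fitted on the training data.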


2.2. Research methodology 

In this study, the electrical load forecasting process was carried out using a deep learning 
method, namely an RNN with the LSTM architecture. The LSTM network is a development of the RNN 
architecture. An RNN is a network that processes sequence or time series data by taking into 
account the output (information) of the previous step and storing that information for a short 
time (short-term memory). 

The LSTM architecture accepts as input a feature Xt, which is the value of the electrical load. 
This feature is passed to the LSTM block for processing. The LSTM block receives ht-1 (the hidden 
state from the previous LSTM block), Ct-1 (the cell state from the previous LSTM block), and Xt 
(the feature input). Its outputs are ht and Ct, the current hidden state and the current cell state 
(memory). Inside the LSTM block, sigmoid and tanh activation functions are used in the input, 
forget, and output gates. Several parameters are used to tune the LSTM model, such as the number of 
nodes and the number of hidden layers. Figure 6 shows the research methodology of this study. 
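
To make the data flow through one LSTM block concrete, the sketch below implements a single time step of the standard LSTM formulation in NumPy. It is an illustration only and not the implementation used in the study; the weight names `W`, `U`, `b` and the gate ordering inside them are assumptions chosen here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Compute (h_t, C_t) from X_t, h_{t-1} and C_{t-1}.

    x_t: (n_in,); h_prev, c_prev: (n_hidden,)
    W: (4 * n_hidden, n_in), U: (4 * n_hidden, n_hidden), b: (4 * n_hidden,)
    Assumed gate order in the stacked weights: input, forget, candidate, output.
    """
    n = h_prev.size
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate
    f = sigmoid(z[n:2 * n])        # forget gate
    g = np.tanh(z[2 * n:3 * n])    # candidate cell values
    o = sigmoid(z[3 * n:4 * n])    # output gate
    c_t = f * c_prev + i * g       # new cell state (memory)
    h_t = o * np.tanh(c_t)         # new hidden state
    return h_t, c_t
```

In a stacked LSTM, the sequence of hidden states produced by one layer serves as the input sequence of the next layer.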


2.3. Evaluation metrics 

To measure the model’s performance, an error rate was calculated using the MAPE. The lower the 
error rate, the higher the forecast accuracy. MAPE is the average of the absolute forecast errors, 
each expressed as a percentage of the actual value: 

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right|$$

where $N$ is the number of samples, $Y_i$ is the actual value, and $\hat{Y}_i$ is the predicted value. 
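
A minimal sketch of the MAPE computation is given below, using the standard definition. Expressing the result in percent and rejecting zero actual values are assumptions made here for illustration.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        # The ratio is undefined for zero actual values.
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))
```

For instance, actual loads of 100 and 200 MW with predictions of 110 and 180 MW each have a 10% absolute error, so the MAPE is 10.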

Figure 6. Research methodology 


3. RESULTS AND DISCUSSIONS 

In this study, testing used an 80:20 split of the dataset into training and validation data. Each 
dataset is trained using the LSTM model. The parameters used in this study include the number of 
hidden layers, the number of hidden nodes, a batch size of 4, 100 epochs, and the Adam optimizer. 


3.1. LSTM 1 layer 

In the one-layer LSTM architecture, the layer has 512 or 1024 hidden units and is trained with the 
Adam optimizer and a mean squared error (MSE) loss function. The number of epochs is 100 with a 
batch size of 4. The input and output layers each have 1 unit. The one-layer architecture has 
1,053,185 and 4,207,617 parameters for 512 and 1024 nodes, respectively. The results of the 
single-layer LSTM test are shown in Table 1. 
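
The way the parameter count grows with nodes and layers can be checked with the standard LSTM counting formula, sketched below. The function names are hypothetical, and the model is assumed to be a stack of LSTM layers followed by a dense output layer. The number of input features per time step is left as an argument because the text does not state it for every configuration.

```python
def lstm_layer_params(n_inputs, n_units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (n_units * (n_inputs + n_units) + n_units)

def stacked_lstm_params(n_features, n_units, n_layers, n_outputs=1):
    """Total parameters of stacked LSTM layers plus a dense output layer."""
    total = 0
    n_inputs = n_features
    for _ in range(n_layers):
        total += lstm_layer_params(n_inputs, n_units)
        n_inputs = n_units  # the next layer reads the previous hidden state
    total += n_units * n_outputs + n_outputs  # dense weights and bias
    return total
```

With one input feature, `stacked_lstm_params(1, 512, 1)` gives 1,053,185, the figure reported above for the one-layer, 512-node model.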


3.2. LSTM 2 layers 

To build the two-layer LSTM architecture, two LSTM blocks are stacked. The two-layer architecture 
differs little from the one-layer architecture: it has 512 or 1024 hidden units, the Adam 
optimizer, and the MSE loss function. The number of epochs and the batch size remain 100 and 4, and 
the input and output layers each have 1 unit. Because there are more layers, however, the number of 
parameters increases to 3,156,481 for 512 nodes and 12,604,417 for 1024 nodes. The results of the 
two-layer LSTM test are shown in Table 2. 


Table 1. Evaluation results for the one-layer LSTM 


Nodes   MAPE (%)   Parameters 
512     9.37       1,053,185 
1024    9.29       4,207,617 


Table 2. Evaluation results for the two-layer LSTM 


Nodes   MAPE (%)   Parameters 
512     9.28       3,156,481 
1024    9.04       12,604,417 


3.3. LSTM 3 layers 

The three-layer LSTM architecture is the same as the two-layer architecture, except that three 
LSTM blocks are stacked. It has 512 or 1024 hidden units, the Adam optimizer, and the MSE loss 
function. The number of epochs and the batch size remain 100 and 4, and the input and output layers 
each have 1 unit. Because there are more layers, the number of parameters increases to 5,257,729 
for 512 nodes and 21,001,217 for 1024 nodes. The results of the three-layer LSTM test are shown in 
Table 3. To evaluate the model’s forecasting ability, we compared it with other models from 
state-of-the-art research [7] and [19] using the MAPE metric; the comparison is summarized in 
Figure 7. 


Table 3. Evaluation results for the three-layer LSTM 


Nodes   MAPE (%)   Parameters 
512     9.13       5,257,729 
1024    8.63       21,001,217 


Figure 7. Comparison of MAPE value with other forecasting techniques (lower is better) 


4. CONCLUSION 

This study applies LSTM to predict the electrical load. Based on the experimental results, the 
best architecture was the three-layer LSTM with 1024 nodes, which reached a MAPE of 8.63. The 
historical features used in this study were the three previous periods. 